SPARQL over GraphX
Abstract
The ability of the RDF data model to link data from heterogeneous domains has led to explosive growth in RDF data, so evaluating SPARQL queries over large RDF datasets has become crucial for the Semantic Web community. However, because RDF data is graph-structured, evaluating SPARQL queries in relational databases and common data-parallel systems requires many joins and is inefficient. At the same time, the sheer size of graph-structured datasets such as social network data has led the database community to develop graph-parallel processing systems that support iterative graph computations efficiently. In this work we take advantage of the graph representation of RDF data and exploit GraphX, a new graph processing system based on Spark. We propose a subgraph matching algorithm, compatible with the GraphX programming model, to evaluate SPARQL queries. We report experiments showing that the system scales to large datasets.
Similar resources
S2X: Graph-Parallel Querying of RDF with GraphX
RDF has constantly gained attention for data publishing due to its flexible data model, raising the need for distributed querying. However, existing approaches using general-purpose cluster frameworks employ a record-oriented perception of RDF ignoring its inherent graph-like structure. Recently, GraphX was published as a graph abstraction on top of Spark, an in-memory cluster computing system....
GraphX: Graph Processing in a Distributed Dataflow Framework
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recover...
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster ...
Reasoning over SPARQL
Until now, the SPARQL query language was restricted to simple entailment. Now SPARQL is being extended with more expressive entailment regimes. This allows querying over inferred, implicit knowledge. However, in this case the SPARQL endpoint provider decides which inference rules are used for its entailment regimes. In this paper, we propose an extension to the SPARQL query language to support ...
How Interlinks Influence Federated over SPARQL Endpoints
As the Web of Data grows, the number of available SPARQL endpoints increases. SPARQL endpoints conceptually represent RPC-style, coarse-grained data access mechanisms. Nevertheless, through the potential interlinking of the contained entities, SPARQL endpoints should be able to offer distinct advantages over plain Web APIs. To our knowledge, to date, there has been no study conducted that gauges...
Journal title:
- CoRR
Volume abs/1701.03091 Issue
Pages -
Publication date 2017